Citation: Delgadillo, J.; Kinyua, J.; Mutigwe, C. FinSoSent: Advancing Financial Market Sentiment Analysis through Pretrained Large Language Models. Big Data Cogn. Comput. 2024, 8, 87. https://doi.org/10.3390/bdcc8080087
Academic Editor: Fabrizio Marozzo
Received: 18 June 2024
Revised: 12 July 2024
Accepted: 28 July 2024
Published: 2 August 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Big Data and Cognitive Computing
Article
FinSoSent: Advancing Financial Market Sentiment Analysis through Pretrained Large Language Models
Josiel Delgadillo 1,*, Johnson Kinyua 2 and Charles Mutigwe 3
1 School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, PA 19104, USA
2 College of Information Sciences and Technology, Pennsylvania State University, Philadelphia, PA 19104, USA; jdk450@psu.edu
3 College of Business, Western New England University, Springfield, MA 01119, USA; charles.mutigwe@wne.edu
* Correspondence: josield@upenn.edu
Abstract: Predicting the directions of financial markets has been performed using a variety of approaches, and the large volume of unstructured data generated by traders and other stakeholders on social media microblog platforms provides unique opportunities for analyzing financial markets using additional perspectives. Pretrained large language models (LLMs) have demonstrated very good performance on a variety of sentiment analysis tasks in different domains. However, it is known that sentiment analysis is a very domain-dependent NLP task that requires knowledge of the domain ontology, and this is particularly the case with the financial domain, which uses its own unique vocabulary. Recent developments in NLP and deep learning, including LLMs, have made it possible to generate actionable financial sentiments using multiple sources including financial news, company fundamentals, technical indicators, as well as social media microblogs posted on platforms such as StockTwits and X (formerly Twitter). We developed a financial social media sentiment analyzer (FinSoSent), which is a domain-specific large language model for the financial domain that was pretrained on financial news articles and fine-tuned and tested using several financial social media corpora. We conducted a large number of experiments using different learning rates, epochs, and batch sizes to yield the best performing model. Our model outperforms current state-of-the-art financial sentiment analysis (FSA) models based on over 860 experiments, demonstrating the efficacy and effectiveness of FinSoSent. We also conducted experiments using ensemble models comprising FinSoSent and the other current state-of-the-art FSA models used in this research, and a slight performance improvement was obtained based on majority voting. Based on the results obtained across all models in these experiments, the significance of this study is that it highlights the fact that, despite the recent advances of LLMs, sentiment analysis even in domain-specific contexts remains a difficult research problem.
Keywords: BERT; financial markets; Twitter/X; StockTwits; sentiment analysis; LLM; social media
1. Introduction
The prediction of the price movement of global financial market trends, corporate earnings, and financial instruments such as stocks is considered to be a very challenging task because it depends on a multitude of complex factors. These factors include economic factors such as GDP and interest rates, fundamental indicators, technical indicators, political events, exchange rates, other external economic factors, etc. Fundamental indicators assess the financial situation of a business and the intrinsic value of its stock by analyzing the data about the firm’s business model, and there are several key indicators such as earnings per share (EPS), the price-to-earnings ratio (P/E), free cash flow (FCF), the price-to-book ratio (P/B), return on equity (ROE), and the debt-to-equity ratio (D/E) [1]. By contrast, technical indicators analyze past market data such as price and volume, to predict future stock price movements with the assumption that past stock price behavior influences the future
market evolution [1]. Commonly used technical indicators include moving averages, the relative strength index (RSI), moving average convergence divergence (MACD), on-balance volume (OBV), the stochastic oscillator, Bollinger bands, the average directional index (ADI), etc. The technical indicator approach to stock price movement prediction contradicts the efficient market hypothesis (EMH) [2,3]. The EMH postulates that it is practically impossible to predict future price movements based on the historical market data because stock price movements are largely driven by new information, follow a random walk pattern, and do not follow any patterns or trends.
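As a concrete illustration of what a technical indicator computes from past prices, the following is a minimal sketch of a trailing simple moving average, the simplest of the indicators listed above. The function name, the window length, and the price series are hypothetical choices made for illustration and are not taken from the study.

```python
from typing import List, Optional


def simple_moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until a full window is available."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages: List[Optional[float]] = []
    running_sum = 0.0
    for i, price in enumerate(prices):
        running_sum += price
        if i >= window:
            # Drop the price that just fell out of the trailing window.
            running_sum -= prices[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


# Hypothetical closing prices, used only to exercise the function.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
sma3 = simple_moving_average(closes, 3)
print(sma3)
```

Indicators such as RSI or MACD are built from similar rolling computations over past prices and volumes, which is exactly the information that the EMH argues cannot predict future movements.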
With the ubiquitous availability of social media services today, social media users and investors post vast amounts of information expressing some opinion or sentiment on financial instruments such as stocks. Real-time stock trading is a very dynamic and highly competitive activity in most financial markets. Investors use a combination of external information and internal company information to make investment decisions, and therefore, gaining an accurate view of traders’ opinions at scale can give a trader an advantage in making investment decisions. The sentiments expressed in news and social media tweets have an impact on stock prices, and hence, constant tracking of these sentiments, for example in microblogs posted on social media platforms such as Twitter or StockTwits [4,5], where users express their opinions on a range of topics including stocks, has become an important activity for many investors.
Sentiment analysis in the financial domain is particularly challenging because this domain uses its own jargon or vocabulary, which, therefore, requires domain-specific sentiment analysis.
Many researchers have proposed various approaches that apply machine learning [6], lexicon-based approaches together with machine learning [7], deep learning approaches [8,9], and natural language processing (NLP) transformers [10] to derive tweet sentiments and, hence, the directions of stock movements or other financial instruments. These approaches can be used to monitor market sentiments expressed in online news articles and/or social media posts in real time and leverage those sentiments in trading decisions. Using a related approach, Bloomberg reported that trading sentiment portfolios outperform the benchmark index significantly [11]. The significance of these approaches is also supported by Tetlock [12] and Tetlock et al. [13], who report that news articles and social media sentiments could be used to predict market return and firm performance.
As noted above, the prediction of the price movement of financial instruments such as stocks is a very challenging task as it depends on a multitude of complex factors; it is, therefore, necessary to integrate multiple data sources such as fundamental indicators, technical indicators, social media posts, and other relevant financial news articles for better performance. The observations above have formed the motivation for this research, and the objectives of this research are the following:
• Apply domain-specific sentiment analysis to develop LLMs for the prediction of financial instruments using multiple data sources from the financial domain.
• Enhance the model’s performance by leveraging pretraining and fine-tuning using financial corpora during model development.
• Compare the performance of the model against a set of sentiment analyzers, which consists of commercial sentiment analyzers, commercial generative AI models, academic sentiment analysis models, and open-source sentiment analyzers.
In our previous research, we developed FinSoSent, a domain-specific language representation model pretrained on financial corpora [14]. In this study, we report on additional work based on over 860 experiments, as explained in Section 3. The FinSoSent model outperforms some of the latest large language models (LLMs), such as FinBERT and GPT-3.5-Turbo 16K (released in June 2023), in detecting the sentiment of social media posts. However, the model accuracy is in the 50–60% range, which is in line with the findings by Zimbra et al. [15], who, in August 2018, before the release of the BERT model in October 2018, evaluated 28 state-of-the-art Twitter sentiment analysis systems across five domains (security, retail, technology, pharmaceuticals, and telecommunications) and found that the average classification accuracies of these systems ranged from 40% to 71%. The significance
of this study is that it highlights that, despite the recent advances of LLMs, the sentiment analysis of social media posts in a domain-specific context remains a difficult research problem.
The rest of this paper is organized as follows. Section 2 contains a literature review of sentiment analysis using NLP and deep neural networks. Section 3 explains the datasets and the methodology. Section 4 contains the results and analysis. Section 5 discusses future work and concludes the paper.
2. Related Work
Recent developments in NLP, deep learning, and transfer learning methods have made it feasible to produce actionable financial sentiments using financial texts found in financial news sources and on social media platforms such as X (formerly Twitter) and StockTwits. NLP techniques can be used to better understand the large body of published financial text data. In particular, deep learning models such as convolutional neural networks, recurrent neural networks (RNNs), and attention mechanisms are efficient and effective for NLP tasks because they require relatively little feature engineering, although they require large corpora of training data [16]. Language models pretrained without supervision on large corpora, such as bidirectional encoder representations from transformers (BERT) [17], ULMFiT [18], ELMo [19], XLNet, and GPT [20], have made significant performance improvements on many NLP tasks in different domains such as question answering, sentiment analysis, language inference, etc. In most cases, these language models are trained on general domain corpora such as news articles, books, and Wikipedia, which may not be suitable for sentiment analysis tasks in the financial domain, which uses its own vocabulary of terms whose semantics are different, as stated earlier.
It has been reported that pretraining a language model using a domain-specific corpus can further improve the task performance compared to fine-tuning a generic language model such as BERT, and some researchers have used this approach to create domain-specific BERT models, as explained briefly in the following examples.
Beltagy et al. [21] developed the SciBERT model by pretraining BERT using a large multi-domain corpus of scientific publications. Lee et al. [22] developed a biomedical domain-specific language representation model called BioBERT by pretraining BERT using large-scale biomedical corpora, and Huang et al. [23] developed ClinicalBERT by pretraining BERT with clinical notes for hospital readmission prediction tasks. Zimbra et al. [15] conducted a thorough study and performance benchmark evaluation of Twitter sentiment analysis systems across five domains (security, retail, technology, pharmaceuticals, and telecommunications) using 28 state-of-the-art systems. They reported that the performance of these systems remains rather poor, with tweet sentiment classification accuracies below 70%. According to their study, the main challenges affecting the accuracy of sentiment analysis systems include novel language with Twitter-specific communication elements, the brevity of tweets (140 characters), strong sentiment class imbalance because the sentiment categories are unequally distributed in datasets, and stream-based tweet generation. Overall, according to this study, the systems performed poorly with a wide range of average classification accuracies, which ranged from 40% to 71%; domain-specific approaches to sentiment analysis outperformed the general-purpose approaches with an improvement of 11%. This outcome supports the view that domain-specific approaches to sentiment analysis are required.
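To make the class-imbalance challenge concrete, the sketch below compares plain accuracy with macro-averaged recall for a trivial classifier that always predicts the most frequent class. The label counts are hypothetical and are not drawn from Zimbra et al. or from this study; the point is only that accuracy alone can look acceptable on skewed data while per-class performance is poor.

```python
from collections import Counter
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class recall, so each class counts equally."""
    recalls: List[float] = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical, deliberately imbalanced label distribution.
y_true = ["neutral"] * 70 + ["positive"] * 20 + ["negative"] * 10

# A "classifier" that ignores its input and predicts the majority class.
majority_label = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_label] * len(y_true)

baseline_accuracy = accuracy(y_true, y_pred)
baseline_macro_recall = macro_recall(y_true, y_pred)
print(baseline_accuracy, baseline_macro_recall)
```

Under these assumed counts the majority-class baseline already reaches an accuracy comparable to the upper end of the reported range, which is why reported accuracies are easier to interpret alongside the class distribution of each dataset.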
As explained above, recent developments in NLP, deep learning, and transfer learning have made significant improvements in the sentiment analysis of financial news and texts as supported by the research undertaken by Agaian and Kolm [24], Man et al. [25], Yang et al. [26], and Zhao et al. [10]. For example, the work discussed by Zhao et al. revealed that BERT and RoBERTa have superior performance in financial sentiment analysis compared with dictionary-based models. RoBERTa is an optimized version of BERT retrained on a dataset ten times bigger using an improved training methodology and different hyperparameters, and it performs better than BERT on many NLP tasks including
text classification [27].
Some researchers have used a similar approach to develop BERT-based models in the financial domain, and there are a number of such models, as explained below. Araci [28,29] developed a FinBERT model by pretraining BERT with a financial corpus and then fine-tuning it using a smaller financial dataset for sentiment classification in the financial domain. To create this FinBERT model, BERT was first pretrained with TRC2-financial, which is a subset of the Reuters dataset [30], and then fine-tuned using the Financial PhraseBank dataset created by Malo et al. [31]. To validate FinBERT, Araci implemented other pretrained language models using ELMo, LSTM, and ULMFiT for financial sentiment analysis for comparison with FinBERT. This FinBERT increased the classification accuracy by 15% compared with these other models. Desola et al. [32] developed three different versions of FinBERT by pretraining BERT with 10-K SEC filings, then testing using 10-Q SEC filings and earnings call transcripts. They reported that their FinBERT models outperform BERT on the masked language model and next sentence prediction tasks.
A different FinBERT model was developed by Liu et al. [33] by starting with BERT and then taking BERT through six self-supervised pretraining tasks, then fine-tuning it with task-specific labeled financial data. They started by pretraining FinBERT simultaneously on a general corpus and a financial domain corpus and then moved on to the fine-tuning phase, where FinBERT is first initialized with the pretrained parameters and is then fine-tuned on task-specific supervised data. The pretraining datasets used are Financial Web [34,35], Yahoo! Finance [36], the English Wikipedia and Books Corpus, and Reddit Finance QA [37]. The task-specific datasets for fine-tuning depend on the intended use such as financial question answering, financial sentence boundary detection, and financial sentiment analysis. The datasets used were FiQA Task 1 and FiQA Task 2 [38], Financial PhraseBank [31], and the FinSBD Shared Task dataset [39]. The results of their experiments showed that their FinBERT outperforms all previous state-of-the-art models in financial question-answering applications, financial sentence boundary detection, and financial sentiment analysis, again supporting the view that domain-specific models have higher performance.
Yang et al. [40] developed their own version of a FinBERT model using a similar approach to the others. They started by compiling a large financial domain corpus using corporate 10-K and 10-Q reports, analyst reports, and earnings conference call transcripts. The corpus is then used to construct a financial vocabulary (FinVocab) for pretraining BERT. The Financial PhraseBank, FiQA Task 1 [38], and AnalystTone datasets [41] are then used to fine-tune the pretrained model, resulting in three different versions of FinBERT. The experimental results showed that their FinBERT models have higher performance compared with the generic BERT models.
Wilksch and Abramova [42] developed a model called PyFin-Sentiment for social media sentiment analysis in the financial domain. Their model was benchmarked against FinBERT [28], VADER [43], Twitter RoBERTa [27], and NTUSD-Fin [44]; their model outperforms these models on the financial social media sentiment classification task, based on results obtained using the Fin-SoMe dataset and their own financial dataset. Their dataset was created by collecting 3,757,384 financial tweets on S&P 500 tickers from Twitter between 1 April 2021 and 1 May 2022 that met certain criteria, then filtering and annotating them to end up with 2,755,824 tweets.
Although BERT appears to have been widely experimented with, it has some shortcomings such as high computing demands needing a GPU/TPU, large memory needs, and long training times. There have been several efforts by different researchers to develop smaller and/or optimized versions of BERT. The XLNet model addresses these disadvantages of BERT by improving its architectural design for pretraining and produces results that outperform BERT on 20 different tasks [45]. Lan et al. [46] developed ALBERT as a smaller and scalable successor of BERT that outperforms BERT on several tasks, including text classification, and reduces the number of parameters required in sentiment analysis compared to BERT. Sanh et al. [47] developed DistilBERT based on a methodology that reduces the size of a BERT model by 40% and makes it 60% faster while retaining 97% of its language understanding capabilities. Facebook has also developed a novel transformer called BART [48] with an architecture similar to GPT-2 and BERT, and it outperforms other
transformers in generation tasks such as text summarizing and question answering. These models have yet to be applied for sentiment analysis in the financial domain.
Mishev et al. [49] designed an evaluation platform for assessing the performance of machine learning classifiers, deep learning classifiers, and fine-tuned NLP transformers. The platform includes capabilities for preprocessing, text feature extraction, and encoding for financial sentiment analysis. Their results show that Facebook’s BART outperforms BERT-based models such as BERT, RoBERTa, ALBERT, and DistilBERT, as well as XLNet in financial sentiment analysis. Unlike Mishev et al., in this work, we focus exclusively on social media texts in our financial sentiment analysis.
3. Materials and Methods
3.1. Datasets and Data Preparation
The FinSoSent model is a domain-specific sentiment analyzer. In this study, we used seven financial domain-specific datasets to pretrain, fine-tune, and test the model. The datasets consist of Twitter and StockTwits posts that are related to the financial markets. Below, we describe these datasets and their application to the model-building process.
3.1.1. Pretraining Dataset
Pretraining provides an opportunity to provide more information to the model before fine-tuning it on downstream tasks. A domain-specific set of documents was extracted from a well-known dataset, the Thomson Reuters Text Research Collection (TRC2) corpus, which comprises 1,800,370 news stories covering the period from 1 January 2008, until 28 February 2009 [30]. We used a custom bag-of-financial-terms classifier to extract 290,444 financial news articles from the TRC2 corpus. This domain-specific dataset, which we refer to as FinTRC2, was used to pretrain our model.
A similar classification of documents in the TRC2 dataset by Araci [28] generated 46,143 financial documents; however, in that work, no details of the classification process were provided other than that it was keyword-based. In this work, we created a custom dictionary with 98 financial terms. As shown in Figure 1, each document in the TRC2 corpus was preprocessed as follows: first, the document was parsed to identify the lexeme type of each token, for example, digits, words, complex words, and email addresses; then, linguistic rules were applied to normalize the lexemes to their infinitive form; finally, the preprocessed document was stored as a sorted array of lexemes [50]. Each term in the custom dictionary was parsed to a lexeme. We searched the preprocessed document lexemes for each dictionary lexeme. If a preprocessed document had at least 3 dictionary lexemes, we classified the corresponding TRC2 document as a financial document.
We used the full-text search (FTS) feature in PostgreSQL to preprocess each document in the TRC2 corpus and stored it as a tsvector. We then performed the search of the dictionary lexemes on the tsvectors using tsquery and the other FTS operators.
Figure 1. Generating the FinTRC2 dataset.
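The dictionary-based filtering described above can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the authors used PostgreSQL full-text search, whereas this sketch replaces its parser and normalization rules with a lowercase regular-expression tokenizer, counts distinct matching lexemes, and uses a five-term placeholder dictionary because the 98 terms of the actual dictionary are not listed in the text. The names FINANCIAL_TERMS, to_lexemes, and is_financial are hypothetical.

```python
import re
from typing import Iterable, Set

# Placeholder dictionary (assumption): the study's dictionary has 98 terms.
FINANCIAL_TERMS = {"dividend", "earnings", "equity", "stock", "bond"}


def to_lexemes(text: str) -> Set[str]:
    """Crude stand-in for FTS parsing: lowercase alphabetic tokens, deduplicated."""
    return set(re.findall(r"[a-z]+", text.lower()))


def is_financial(text: str, dictionary: Iterable[str], min_matches: int = 3) -> bool:
    """Flag a document that contains at least `min_matches` distinct dictionary lexemes."""
    dictionary_lexemes = {lex for term in dictionary for lex in to_lexemes(term)}
    return len(to_lexemes(text) & dictionary_lexemes) >= min_matches


doc_a = "The stock rose after earnings beat forecasts and the dividend was raised."
doc_b = "The team won the match; stock footage was shown at half time."
print(is_financial(doc_a, FINANCIAL_TERMS), is_financial(doc_b, FINANCIAL_TERMS))
```

The threshold of three matches follows the rule stated in the text; whether repeated occurrences of the same term count more than once is not specified there, so counting distinct lexemes is an assumption of this sketch.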
3.1.2. Fine-Tuning Datasets
Social Sentiment Indices powered by X-Scores (SSIX): The FinSoSent model was fine-tuned on the SSIX dataset. This dataset consists of 2886 financial messages from StockTwits and Twitter with opinion targets [51]. The period of collection for this dataset was between October 2011 and June 2015. This dataset was annotated by financial experts using a scale of 1 to 7 for negative to positive sentiment at an entity level. This integer scale was eventually consolidated into a real number sentiment score in the [−1, 1] range for each message. Among our labeled datasets, the SSIX dataset provided the best sentiment class distribution at 23% for negative, 34% for neutral, and 44% for positive, as shown in Figure 2, so we chose to use it to train our model.
Fin-SoMe (FSM): The FSM dataset consists of 10,000 messages from StockTwits [52]. Chen et al. did not provide a time period for when these StockTwits messages were collected. The FSM dataset is a gold standard that was annotated by experts working in a bank's treasury marketing and risk management units. The market sentiment of each message in the FSM dataset was labeled as either bullish, bearish, or neither.
SemEval-2017 Task 5 (SET5): This dataset was posted on 6 September 2016 for Subtask 1 of the SemEval-2017 Task 5 Fine-Grained Sentiment Analysis on Financial Microblogs and News data. It consists of 1285 StockTwits and Twitter messages containing a cashtag, which is a company stock symbol preceded by a "$" [53]. Each message has a sentiment score between −1 and 1.
Figure 2. Distribution of sentiment classes for all datasets.
3.1.3. Testing Datasets
For testing the FinSoSent model, as well as the other five comparative sentiment analyzers, we used the following three datasets:
Fin-Lin (FL_ST): The parent dataset (Fin-Lin) consists of 3811 documents, comprised of microblogs from StockTwits, news articles from Yahoo! News, financial reports for publicly traded companies, and analyst reports from 1 July 2018 to 30 September 2018 [54]. We extracted only the StockTwits data from Fin-Lin to create the FL_ST dataset, which consisted of 3204 StockTwits messages. Each message in the FL_ST dataset was labeled with a numeric sentiment score in the [−1, 1] range.
Sanders: The Sanders dataset consists of 5512 tweets on four different topics (Apple, Google, Microsoft, and Twitter). This dataset is a gold standard with each tweet manually labeled by one annotator as either positive, negative, neutral, or irrelevant with respect to the topic [55]. These tweets were collected between 2007 and 2011.
Taborda-L: The Taborda labeled dataset consists of 1300 tweets, which were collected between 9 April 2020 and 16 July 2020, using the following Twitter tags as the search parameters: #SPX500, #SP500, SPX500, SP500, $SPX, #stocks, $MSFT, $AAPL, $AMZN, $FB, $BBRK.B, $GOOG, $JNJ, $JPM, $V, $PG, $MA, $INTC, $UNH, $BAC, $T, $HD, $XOM, $DIS, $VZ, $KO, $MRK, $CMCSA, $CVX, $PEP, and $PFE. The tweets were manually annotated with positive, neutral, or negative sentiment classes [56].
The summary data of the testing datasets is shown in Table 1.
Table 1. Summary statistics of all the datasets. "Avg. FT" is the average of the three fine-tuning datasets, and "Avg. Test" is the average of the three testing datasets. The Avg. doc length is the average document length in characters.

| Analysis | Avg. FT | FSM | SET5 | SSIX | Avg. Test | Fin-Lin | Taborda | Sanders |
|---|---|---|---|---|---|---|---|---|
| Unique documents | 4203 | 9885 | 1133 | 1591 | 2452 | 2794 | 1284 | 3277 |
| Avg doc length | 96 | 118 | 80 | 89 | 119 | 107 | 151 | 100 |
| Positive (#) | 3005 | 7377 | 655 | 984 | 706 | 1101 | 523 | 494 |
| Neutral (#) | 903 | 1805 | 375 | 528 | 1172 | 874 | 420 | 2223 |
| Negative (#) | 295 | 703 | 103 | 79 | 573 | 819 | 341 | 560 |
| Positive (%) | 65 | 75 | 58 | 62 | 32 | 39 | 41 | 15 |
| Neutral (%) | 28 | 18 | 33 | 33 | 44 | 31 | 33 | 68 |
| Negative (%) | 7 | 7 | 9 | 4 | 24 | 29 | 27 | 17 |
| Token count mean | 21 | 27 | 17 | 20 | 25 | 23 | 32 | 21 |
| Token count median | 20 | 27 | 14 | 19 | 24 | 22 | 29 | 21 |
| Word count mean | 16 | 20 | 13 | 15 | 17 | 15 | 22 | 15 |

The sentiment class distribution of the testing datasets is shown in Figure 2.
3.1.4. Data Preprocessing
The goal is to build a sentiment analysis model that generalizes well across many texts; to do so, we experimented on the six datasets mentioned above for the fine-tuning and testing of the models. The datasets differ in class balance (from unbalanced to more evenly distributed), in the time periods over which the data were recorded, and in size. Minor processing of the data was needed so that each classification was represented by an integer in [0, 1, 2]; this required converting text-based classification labels and floating-point representations into the integer form mentioned above.
Before training and testing the models on the datasets, it is important to harmonize the datasets. This included understanding what makes the datasets different concerning the sentiment labeling technique and the text represented within the datasets.
Social Sentiment Indices powered by X-Scores (SSIX): For this dataset, it was identified that there were 1029 duplicate documents, and only 285 documents were unique. All duplicates were removed, and the first instance of the document was kept. We also noted that the duplicates had different sentiment average scores. For determining the classification, we used the score provided and re-labeled as follows: negative [−1.0, −0.1), neutral [−0.1, 0.1], and positive (0.1, 1.0].
Fin-SoMe (FSM): The FSM data required cleaning up, and all duplicates that were identified were from the ticker sign $. All duplicates were removed, and the first instance of the document was kept. The dataset had three classifications, "negative", "unsure", and "positive"; "unsure" in this case was changed to represent "neutral" in order to normalize our experiments.
SemEval-2017 Task 5 (SET5): SET5 experienced duplicate documents, which required the removal of 152 records, where the first instance of the document was kept. It is noted that the duplicate values had differing sentiment values from the first record. For determining the classification, we used the score provided and re-labeled as follows: negative [−1.0, −0.1), neutral [−0.1, 0.1], and positive (0.1, 1.0].
Fin-Lin (FL_ST): Fin-Lin experienced duplicate documents, and all duplicates were removed and the first instance of the document kept. The dataset also had a category determining the source of the documents; the dataset was then filtered to only contain data from StockTwits, designated as SW. For determining the classification, we used the score provided and re-labeled as follows: negative [−1.0, −0.1), neutral [−0.1, 0.1], and positive (0.1, 1.0].
Sanders: No duplicates were observed in the Sanders dataset, but many documents were written in a language other than English; this was determined by using the langdetect package, which leverages Google's language-detection library. Out of the 5113 documents, all documents that were not labeled as English were removed, resulting in the removal of 1477 documents. Finally, the dataset had four remaining classifications: "negative", "irrelevant", "neutral", and "positive"; "irrelevant" in this case was removed in order to normalize our experiments.
Taborda-L: Taborda experienced duplicate documents, which required the removal of 16 records, where the first instance of the document was kept. We also noted that the duplicates had different sentiment average scores and different created_at values.
The numerical ranges chosen for mapping floating-point scores to the classifications were selected to maximize the identification of text that expressed emotions; the positive and negative ranges each span 0.9 of the score scale, while the neutral range spans 0.2.
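The harmonization steps described for the individual datasets (keep the first instance of each duplicate, map scores in [−1, 1] to three classes, and map text labels to integers) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the assignment 0 = negative, 1 = neutral, 2 = positive is an assumption, since the text only states that classes are represented by an integer in [0, 1, 2].

```python
# Assumed integer encoding; the paper only states that classes are integers in [0, 1, 2].
NEGATIVE, NEUTRAL, POSITIVE = 0, 1, 2

# Text labels seen in the datasets; "unsure" is treated as neutral, as described for FSM.
TEXT_LABELS = {"negative": NEGATIVE, "unsure": NEUTRAL, "neutral": NEUTRAL, "positive": POSITIVE}

def drop_duplicates_keep_first(records):
    """Remove duplicate documents, keeping the first (text, label) pair seen for each text."""
    seen = set()
    unique = []
    for text, label in records:
        if text not in seen:
            seen.add(text)
            unique.append((text, label))
    return unique

def score_to_class(score):
    """Map a score in [-1, 1] to negative [-1.0, -0.1), neutral [-0.1, 0.1], positive (0.1, 1.0]."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < -0.1:
        return NEGATIVE
    if score <= 0.1:
        return NEUTRAL
    return POSITIVE

records = [("$AAPL to the moon", 0.8), ("$AAPL to the moon", 0.3), ("flat day for $MSFT", 0.05)]
cleaned = [(text, score_to_class(score)) for text, score in drop_duplicates_keep_first(records)]
print(cleaned)
```

Note that the duplicate with the differing score (0.3) is discarded, which matches the keep-first policy described above.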
The FinSoSent model was built using the pretrained BERT model, which uses the transformer architecture. This architecture performs well for sentiment analysis due to its ability to capture long-range dependencies without relying on sequential processing. BERT converts text into embeddings, establishing token similarities and understanding through its encoder mechanism. According to Devlin et al. [17], the BERT transformer uses bidirectional self-attention, and this is supported by the pretraining tasks they used, which included masked LM and next sentence prediction.
The benefit of using BERT for sentiment analysis is that we can add additional domain-related information as a pretraining step. Pretraining allows us to initialize the weights of the model; in our case, we wanted to train the model using financial-related information as tokens. To use pretraining, we needed to procure a corpus of financial-related information, and we were able to leverage the Thomson Reuters Corpus to identify quality articles and, thus, documents to sample as a pretraining dataset. As shared by Devlin et al. [17], document-level samples are preferred in order to extract long contiguous text, and the Thomson Reuters Corpus gave us this ability. The approach treated each article in the corpus as a document, which was then tokenized, and the tokens were evaluated for financial content. The creation of the pretraining datasets included the requirement of validating tokens such that they matched a list of targeted financial tokens. Each document was required to meet a threshold of matching tokens, similar to a bag-of-words approach. In total, we were able to create a dataset of 49 million words.
The different versions of FinBERT are briefly explained in Section 2, and we compared our approach to two of those FinBERT models, the FinBERT model by Liu et al. [33] and the FinBERT model by Yang et al. [40]. Liu et al. took the approach of using Financial Web and Yahoo Finance data with a crawler and Reddit Finance QA for Reddit posts with more than 4 upvotes in total, creating a pretraining dataset of 12.71 billion words. Yang et al. used SEC annual filing reports from 10-K and 10-Q documents, earnings call transcripts, and analyst reports, totaling 4.9 billion words.
The datasets used for fine-tuning remained in their raw state, including misspelled words, emojis, emoticons, jargon, numbers, and connotations. The decision was made to use the power of embeddings to create the relationships of the tokens within the given document. As we explored preprocessing the text, we noticed that the original meaning of the document was lost and the text became nearly illegible. Preprocessing complex text like social media jargon or domain-specific information in finance may result in losing its meaning. An example of this is shown in Figure 3.
You will notice that much of the original meaning is lost, if not completely changed, after undergoing a few preprocessing steps that are often recommended. In addition, there was an inability to handle abbreviated words like "mktcap", which represents "market cap", as well as an increase in misspelled words, such as "minutes" becoming "minut". An alternative approach to preprocessing can be to leverage generative text large language models (LLMs), something we will continue to experiment with in future works.
According to Balaji et al. [57], sentiment analysis has different classifications, which differ in granularity, from the document level down to more granular classifications like the phrase level. Each type of classification has its benefits and drawbacks. The document level benefits from having a sentiment classification for possibly one or many different topics; the inverse of this is that the sentiment classification given can be misleading as the document as a whole may inherit positive, neutral, or negative pieces of information.
The datasets we used during the experimentation ranged from an average of 17 to 32 tokens, while the testing datasets averaged 25 tokens. In addition, the word restrictions on Twitter posts make it difficult to convey different, contradicting sentiments in one post. Within the Model Development Results Section, error analysis was conducted supporting this claim that the sentences largely are not incredibly long or complex, as can be seen in row 3 of Table 1.
Original text: $FTR Had $775Mil MktCap at close Yest. 10% Has Evaporated In 90 minute today.Been warned again, but Know-it-alls Dream of BS 25% Divys.

| Preprocessing step | Output |
|---|---|
| Convert to lowercase and remove special characters | ftr had mil mktcap at close yest has evaporated in minutes today been warned again but knowitalls dream of bs divys |
| Remove stop words | ftr mil mktcap close yest evaporated minutes today warned knowitalls dream bs divys |
| Perform stemming | ftr mil mktcap close yest evapor minut today warn knowital dream bs divi |
| Perform lemmatization | ftr mil mktcap close yest evapor minut today warn knowital dream b divi |
| Expand contractions | ftr mil mktcap close yest evapor minut today warn knowital dream bs divi |

Figure 3. An example of preprocessing a social media post.
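The first two preprocessing steps named in Figure 3 can be reproduced with a short sketch. The helper names and the tiny stop-word list are assumptions made for illustration (standard stop-word lists are much longer), and the stemming and lemmatization steps are omitted because they require an external NLP library.

```python
import re

# Small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"had", "at", "has", "in", "been", "again", "but", "of"}

def lowercase_and_strip(text):
    """Lowercase, join hyphenated words, and replace every non-letter with a space."""
    text = text.lower().replace("-", "")
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())

def remove_stop_words(text):
    """Drop tokens that appear in the stop-word list."""
    return " ".join(word for word in text.split() if word not in STOP_WORDS)

post = ("$FTR Had $775Mil MktCap at close Yest. 10% Has Evaporated In 90 minute "
        "today.Been warned again, but Know-it-alls Dream of BS 25% Divys.")

step1 = lowercase_and_strip(post)
step2 = remove_stop_words(step1)
print(step1)
print(step2)
```

Even these two steps already discard the cashtag, the dollar amount, and both percentages, which carry much of the financial meaning of the post.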
3.2. Model Development
To identify the best-performing model for predicting the correct classification, we made a series of configuration changes to the hyperparameters. All models were trained in the same manner, other than the specific variation displayed in the tables below in Sections 3.2.1–3.2.5.
3.2.1. Pretraining
Pretraining offers the ability to provide more information to the model before fine-tuning it on a downstream task; this was explained in depth in the Methodology Section. We experimented with training the models on different variations of the documents to see if there were improvements and compared them with a model without the pretraining step. The pretraining TRC5K dataset has 2 million tokens, TRC100K 43 million tokens, and TRC150K 61 million tokens. Each TRC dataset, as its document size increases, includes the tokens from the smaller datasets.
Based on our experimental results in Tables 2 and 3, the model showed only an insignificant improvement when pretrained: the average weighted F1-score improved by 0.01. Pretraining a model properly should, however, yield more consistent predictions that are balanced across the dataset; this is shown by the differences in accuracy and weighted F1-score between no pretraining and TRC100K pretraining.
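Because the tables that follow report both accuracy and weighted F1-score, it may help to see how the two metrics can diverge on an imbalanced label set. The sketch below implements both from their standard definitions (per-class F1 weighted by class support); the toy labels are hypothetical and are not drawn from the paper's datasets.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of the true labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score

# Hypothetical labels: 0 = negative, 1 = neutral, 2 = positive.
y_true = [2, 2, 2, 2, 1, 1, 0, 0]
y_pred = [2] * 8  # a degenerate model that always predicts "positive"

print(f"accuracy    = {accuracy(y_true, y_pred):.3f}")
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")
```

A model that only predicts the majority class reaches 0.5 accuracy here but a weighted F1 of about 0.333, which is why a gap between the two metrics hints at unbalanced predictions.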
Table 2. Model accuracy, when the pretraining dataset is variable and all other parameters are fixed. With fine-tuning of the dataset: FSM, learning rate: 2 × 10⁻⁴, epochs: 50, batch size: 128. The bolded average value represents the optimal dataset.

| Pretraining Dataset | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| NONE | 0.526 | 0.610 | 0.572 | 0.570 |
| TRC5K | 0.506 | 0.620 | 0.520 | 0.548 |
| TRC100K | 0.536 | 0.634 | 0.547 | **0.572** |
| TRC150K | 0.512 | 0.593 | 0.495 | 0.533 |

Table 3. Model weighted F1-score, when the pretraining dataset is variable and all other parameters are fixed. With fine-tuning of the dataset: FSM, learning rate: 2 × 10⁻⁴, epochs: 50, batch size: 128. The bolded average value represents the optimal dataset.

| Pretraining Dataset | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| NONE | 0.505 | 0.584 | 0.555 | 0.548 |
| TRC5K | 0.457 | 0.593 | 0.475 | 0.509 |
| TRC100K | 0.512 | 0.633 | 0.529 | **0.558** |
| TRC150K | 0.471 | 0.568 | 0.450 | 0.496 |

3.2.2. Fine-Tuning
To train a model to make predictions, a dataset must contain a set of documents with the corresponding sentiment classification. This can be expressed as a range from [−1, 1], as text, or as a numerical representation for each classification. Financial instruments can move in three directions: up, down, or sideways. To support the mobility of assets in a market, a dataset or a derivative of a dataset containing three sentiment labels is required to train the model for those predictions. We can take advantage of a technique in machine learning called fine-tuning. The benefit of fine-tuning is that you can train a model to make better predictions about a downstream task, as explained earlier.
The results in Tables 4 and 5 bear out that selecting the right dataset for fine-tuning is critical. When comparing across the three datasets FSM, SET5, and SSIX, we see a large range of performance, from 0.336 to 0.572, which amounts to a relative boost of 70.2% over the 0.336 baseline. Once we identified that the FSM dataset's performance was the best, we wanted to optimize the model by training it on a balanced dataset using SMOTE [58]. SMOTE is a technique that allows us to balance an unbalanced dataset; in our case, it synthetically adds samples so that each sentiment class makes up 33.3% of the training data. This technique showed the complexity of selecting the proper fine-tuning dataset, as we saw a degradation of the model's performance across the board when training on the balanced data. The same conclusion was reached when we experimented with ADASYN, which also produces a balanced dataset but generates samples based on the difficulty of a document, focusing on the minority classification within the dataset [59].
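The core idea behind SMOTE, creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched in plain Python. This is a simplified illustration and not the authors' pipeline: the function name `smote_like`, the toy 2-D feature vectors, and the choice of `k` are assumptions, and the paper does not state which feature representation of the text was oversampled.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the remaining minority points by squared Euclidean distance to base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Toy 2-D feature vectors (hypothetical): 6 majority samples vs. 3 minority samples.
majority = [(5.0, 5.0), (5.5, 4.8), (6.0, 5.2), (5.2, 5.9), (4.8, 5.1), (5.7, 5.5)]
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

new_points = smote_like(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

With three classes, repeating this for each minority class until all classes match the majority count gives the 33.3% per-class share mentioned above. ADASYN differs mainly in how many synthetic points it assigns to each minority sample.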
Table 4. Model accuracy, when the fine-tuning dataset is variable and all other parameters are fixed. With pretraining of the dataset: TRC100K, learning rate: 2 × 10⁻⁴, epochs: 50, batch size: 128. The bolded average value represents the optimal dataset.

| Fine-Tuning Dataset | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| FSM | 0.536 | 0.634 | 0.547 | **0.572** |
| FSM_ADASYN | 0.447 | 0.616 | 0.439 | 0.501 |
| FSM_SMOTE | 0.393 | 0.651 | 0.375 | 0.473 |
| SET5 | 0.501 | 0.246 | 0.510 | 0.419 |
| SSIX | 0.494 | 0.272 | 0.501 | 0.422 |

Table 5. Model weighted F1-score, when the fine-tuning dataset is variable and all other parameters are fixed. With pretraining of the dataset: TRC100K, learning rate: 2 × 10⁻⁴, epochs: 50, batch size: 128. The bolded average value represents the optimal dataset.

| Fine-Tuning Dataset | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| FSM | 0.512 | 0.633 | 0.529 | **0.558** |
| FSM_ADASYN | 0.441 | 0.598 | 0.421 | 0.487 |
| FSM_SMOTE | 0.354 | 0.583 | 0.316 | 0.417 |
| SET5 | 0.424 | 0.160 | 0.426 | 0.336 |
| SSIX | 0.428 | 0.213 | 0.423 | 0.355 |

3.2.3. Learning Rate
In the same vein that the fine-tuning dataset is critical for performance, we found that the learning rate is important as well. Research work conducted by Li et al. [60] concluded that, when fine-tuning a BERT model, selecting the appropriate learning rate is important for capturing details and general feature information. This was revealed in an experiment focused on tuning the learning rate and batch size hyperparameters. As is evident in Tables 6 and 7 below, results vary widely with the learning rate selected, ranging from abysmal weighted F1-scores to an acceptable model. Results vary due to a few factors, but in particular, tuning this hyperparameter changed how effectively the model learned the downstream task.
While training a model, we optimize the loss function; the learning rate determines how large a step we take toward a minimal loss. When selecting a model based on the optimization of the loss function, it is critical to understand how the model performed relative to the loss function while testing the results.
A large learning rate like 2 × 10⁻³ does not reach the global or a local minimum and can lead to large updates, which in turn prevent the model from reaching acceptable performance. We also noticed that taking steps that are too small, as with 2 × 10⁻⁶, did not achieve optimal results when compared to a relatively larger learning rate like 2 × 10⁻⁴. Similarly, Li et al. concluded that a lower learning rate provided the best results when training a BERT model across different domains, as they compared learning rates of 4 × 10⁻⁵, 3 × 10⁻⁵, and 2 × 10⁻⁵ and found 2 × 10⁻⁵ to perform the best.
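The effect of step size can be illustrated on a toy problem. The sketch below runs plain gradient descent on the one-dimensional loss L(w) = w², which is a deliberately simple stand-in and not the BERT fine-tuning objective; the three learning rates are chosen only to show the overshooting, well-behaved, and too-slow regimes and do not correspond to the values in Tables 6 and 7.

```python
def final_loss(learning_rate, steps=50, w=1.0):
    """Run gradient descent on the toy loss L(w) = w**2 and return the final loss."""
    for _ in range(steps):
        grad = 2.0 * w          # dL/dw
        w -= learning_rate * grad
    return w ** 2

# Too large (diverges), moderate (converges), too small (barely moves in 50 steps).
for lr in (1.1, 0.1, 0.0001):
    print(f"lr={lr}: final loss = {final_loss(lr):.3e}")
```

With the largest rate every update overshoots the minimum and the loss grows; with the smallest rate the loss has hardly changed after 50 steps. The same qualitative trade-off is what the learning-rate sweep in this section probes.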
Table 6. Model accuracy, variable learning rate, with all other parameters fixed. With pretraining of the dataset: TRC100K, fine-tuning dataset: FSM, epochs: 50, batch size: 128. The bolded average value represents the optimal dataset.

| Learning Rate | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| 2 × 10⁻³ | 0.394 | 0.151 | 0.407 | 0.317 |
| 2 × 10⁻⁴ | 0.536 | 0.634 | 0.547 | **0.572** |
| 2 × 10⁻⁵ | 0.459 | 0.658 | 0.495 | 0.537 |
| 2 × 10⁻⁶ | 0.481 | 0.654 | 0.477 | 0.537 |

Table 7. Model weighted F1-score, variable learning rate, with all other parameters fixed. With pretraining of the dataset: TRC100K, fine-tuning dataset: FSM, epochs: 50, batch size: 128. The bolded average value represents the optimal dataset.

| Learning Rate | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| 2 × 10⁻³ | 0.223 | 0.040 | 0.236 | 0.166 |
| 2 × 10⁻⁴ | 0.512 | 0.633 | 0.529 | **0.558** |
| 2 × 10⁻⁵ | 0.407 | 0.591 | 0.444 | 0.481 |
| 2 × 10⁻⁶ | 0.403 | 0.591 | 0.410 | 0.468 |

3.2.4. Epoch
The epoch is a hyperparameter that is important for controlling the learning capabilities of a neural network model. While evaluating FinSoSent, we found, as shown in Tables 8 and 9, that using the largest epoch count did not yield the best performance; rather, settling at around 50 epochs improved performance on average by 10% when compared to the 15-epoch training run for the weighted F1-score. Devlin et al. [17] originally trained the BERT model to perform across multiple tasks, but we found more success using a larger epoch count like 50.
Table 8. Model accuracy, variable epoch, with all other parameters fixed. With pretraining of the dataset: TRC100K, fine-tuning dataset: FSM, learning rate: 2 × 10⁻⁴, batch size: 128. The bolded average value represents the optimal dataset.

| Epoch | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| 15 | 0.490 | 0.621 | 0.484 | 0.528 |
| 50 | 0.536 | 0.634 | 0.547 | **0.572** |
| 75 | 0.515 | 0.647 | 0.521 | 0.561 |

Table 9. Model weighted F1-score, variable epoch, with all other parameters fixed. With pretraining of the dataset: TRC100K, fine-tuning dataset: FSM, learning rate: 2 × 10⁻⁴, batch size: 128. The bolded average value represents the optimal dataset.

| Epoch | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| 15 | 0.440 | 0.595 | 0.430 | 0.488 |
| 50 | 0.512 | 0.633 | 0.529 | **0.558** |
| 75 | 0.487 | 0.588 | 0.493 | 0.523 |

3.2.5. Batch Size
Another hyperparameter that, similar to the epoch, contributes to the number of samples the model is given is the batch size. According to Popel and Bojar, the batch size is defined as the number of training examples used by one GPU in one training step [61]. Their findings consistently concluded that larger batch sizes led to a quality model in their transformer model for language translations. We arrived at a similar conclusion when comparing four different batch sizes averaged across all test datasets, as shown in Tables 10 and 11. It is also important to note that larger batch sizes also contribute to larger training times when compared against the other evaluated models with lower batch sizes.
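To make the definition concrete, the number of training steps per epoch follows directly from the dataset size and the batch size. The sketch below uses the 9885 unique FSM documents reported in Table 1 and assumes, purely for illustration, that all of them are used for training on a single GPU; the paper does not state the exact training split.

```python
import math

def steps_per_epoch(n_examples, batch_size):
    """Number of optimizer steps needed to see every example once (last batch may be smaller)."""
    return math.ceil(n_examples / batch_size)

N_FSM = 9885  # unique FSM documents (Table 1); assumed here to all be used for training

for batch_size in (32, 64, 96, 128):
    print(f"batch size {batch_size:>3}: {steps_per_epoch(N_FSM, batch_size)} steps per epoch")
```

Under these assumptions, a batch size of 128 takes 78 steps per epoch while a batch size of 32 takes 309, so each epoch performs fewer but larger gradient updates as the batch size grows.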
Table 10. Model accuracy, variable batch size, with all other parameters fixed. With pretraining of the dataset: TRC100K, fine-tuning dataset: FSM, learning rate: 2 × 10⁻⁴, epochs: 50. The bolded average value represents the optimal dataset.

| Batch Size | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| 32 | 0.394 | 0.151 | 0.407 | 0.317 |
| 64 | 0.492 | 0.545 | 0.537 | 0.525 |
| 96 | 0.542 | 0.575 | 0.551 | 0.556 |
| 128 | 0.536 | 0.634 | 0.547 | **0.572** |

Table 11. Model weighted F1-score, variable batch size, with all other parameters fixed. With pretraining of the dataset: TRC100K, fine-tuning dataset: FSM, learning rate: 2 × 10⁻⁴, epochs: 50. The bolded average value represents the optimal dataset.

| Batch Size | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| 32 | 0.223 | 0.040 | 0.236 | 0.166 |
| 64 | 0.451 | 0.551 | 0.494 | 0.499 |
| 96 | 0.517 | 0.586 | 0.524 | 0.542 |
| 128 | 0.512 | 0.633 | 0.529 | **0.558** |

3.2.6. Handling Imbalanced Classification Datasets
We recognize that the dataset used for fine-tuning is imbalanced across its classes. To address this limitation, we trained two models to compare against a baseline model. As stated earlier, we incorporated ADASYN and SMOTE to compare different techniques in search of an increase in performance. Our results are shown in Tables 12 and 13. Although ADASYN outperformed the other models on one dataset, it did not improve performance on the datasets with a lower weighted F1-score and accuracy. It remains undetermined whether a balanced distribution of labels leads to a performance increase; this is reflected in the test results and in Section 3.2.2, where the most balanced fine-tuning dataset available in this experimentation did not yield the best results.
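The interpolation idea behind SMOTE can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: it assumes each document is already represented as a numeric feature vector, and the function name `smote_like_oversample` and the parameter `k` are hypothetical. ADASYN uses the same kind of interpolation but generates more synthetic points for minority samples that are harder to learn, which is not shown here.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-dimensional features for an under-represented class.
minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_oversample(minority_class, n_new=5)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already covered by that class.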
Table 12. Model accuracy when the fine-tuning dataset is variable and all other parameters are fixed. Pretraining dataset: TRC100K; learning rate: 2 × 10⁻⁴; epochs: 50; batch size: 128. The bolded average value represents the optimal dataset.

| Fine-Tuning Dataset | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| FSM | 0.536 | 0.634 | 0.547 | **0.572** |
| FSM_ADASYN | 0.447 | 0.616 | 0.439 | 0.501 |
| FSM_SMOTE | 0.393 | 0.651 | 0.375 | 0.473 |
Table 13. Model weighted F1-score when the fine-tuning dataset is variable and all other parameters are fixed. Pretraining dataset: TRC100K; learning rate: 2 × 10⁻⁴; epochs: 50; batch size: 128. The bolded average value represents the optimal dataset.

| Fine-Tuning Dataset | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| FSM | 0.512 | 0.633 | 0.529 | **0.558** |
| FSM_ADASYN | 0.441 | 0.598 | 0.421 | 0.487 |
| FSM_SMOTE | 0.354 | 0.583 | 0.316 | 0.417 |
3.3. Model Development Results
After conducting many thorough experiments, in a manner similar to a grid search over different datasets and hyperparameters, we were able to build a sentiment analyzer that performs well on a trinary classification problem for financial texts in social media. Some hyperparameters provided a boost to the model beyond the data that were used. A few choices made when training a model for sentiment analysis have a profound effect, such as the selection of the fine-tuning dataset, the learning rate, and the batch size. We observed that not all datasets are created equal: specific datasets optimize what the machine learning model learns, and the testing datasets also present their own complexities, on which the models, on average, perform well. We also observed that finding the correct balance in the learning rate, in addition to the batch size used, is essential for achieving large increases in performance. Contrary to our expectations, the pretraining datasets used led to inconclusive results, with evidence of only marginally better performance.
To further understand the performance of our FinSoSent model, we conducted an error analysis comparing the ground truth with the FinSoSent predictions. We found no conclusive evidence in the text structure across the three test datasets (Fin-Lin, Taborda, and Sanders) that made them grammatically different. We recorded the average number of tokens and the average number of nouns present in each document within the datasets. We first define a severely inaccurate prediction as an inaccurate prediction at the opposite polar extreme of sentiment, for example, a positive prediction when the ground truth is negative. The results in Tables 14–16 show that accurately, inaccurately, and severely inaccurately predicted documents all have roughly the same structure across the sentiment classifications. Between accurate and inaccurate predictions, we noticed on average a difference of only one token across the datasets. The token counts of severely inaccurate predictions follow a similar profile, with one exception in Sanders: in the positive classification, the token count drops significantly compared with the accurate and inaccurate predictions. The noun count is also inconclusive, as there are samples where a larger noun count does not consistently signify that our model provided an accurate prediction.
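The three error categories defined above can be expressed compactly. The following is a minimal sketch, assuming the labels are the strings "negative", "neutral", and "positive" and that tokens are split on whitespace; both are simplifying assumptions, the function names are hypothetical, and the sample documents are made up for illustration.

```python
from collections import defaultdict


def error_category(truth, predicted):
    """Classify a prediction as accurate, inaccurate, or severely inaccurate."""
    if truth == predicted:
        return "accurate"
    # A polar flip (positive <-> negative) is the severe case.
    if {truth, predicted} == {"positive", "negative"}:
        return "severely inaccurate"
    return "inaccurate"


def mean_token_count_by_category(samples):
    """samples: iterable of (text, truth, predicted) triples."""
    totals = defaultdict(lambda: [0, 0])  # category -> [token sum, document count]
    for text, truth, predicted in samples:
        bucket = totals[error_category(truth, predicted)]
        bucket[0] += len(text.split())  # whitespace tokens: a simplifying assumption
        bucket[1] += 1
    return {cat: tokens / docs for cat, (tokens, docs) in totals.items()}


# Made-up examples, not drawn from the test datasets.
samples = [
    ("stock looks strong today", "positive", "positive"),
    ("down we go", "negative", "positive"),
    ("earnings call at noon", "neutral", "positive"),
]
print(mean_token_count_by_category(samples))
```

Note that a neutral ground truth can never fall into the severely inaccurate category under this definition, since only a flip between the two polar labels qualifies.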
Table 14. Results of sentiment classification by token count and parts-of-speech for the Fin-Lin dataset.

| Fin-Lin | Negative | Neutral | Positive |
|---|---|---|---|
| Accurate: Token Count | 22 | 25 | 22 |
| Inaccurate: Token Count | 22 | 23 | 23 |
| Severely Inaccurate: Token Count | 23 |  | 22 |
| Accurate: Noun Count | 7 | 11 | 7 |
| Inaccurate: Noun Count | 8 | 9 | 9 |
| Severely Inaccurate: Noun Count | 8 |  | 7 |
Table 15. Results of sentiment classification by token count and parts-of-speech for the Taborda dataset.

| Taborda | Negative | Neutral | Positive |
|---|---|---|---|
| Accurate: Token Count | 32 | 30 | 32 |
| Inaccurate: Token Count | 33 | 29 | 35 |
| Severely Inaccurate: Token Count | 31 |  | 34 |
| Accurate: Noun Count | 12 | 12 | 13 |
| Inaccurate: Noun Count | 12 | 13 | 12 |
| Severely Inaccurate: Noun Count | 12 |  | 12 |
Table 16. Results of sentiment classification by token count and parts-of-speech for the Sanders dataset.

Sanders | Negative | Neutral | Positive
20 21
Accurate: Token Count
Inaccurate: Token Count
Severely Inaccurate: Token Count
Accurate: Noun Count
Inaccurate: Noun Count
22 22 22 7 7 7 8 22 19 17 7 7
When evaluating the severely inaccurate samples, we identified a few types of text that FinSoSent struggled to predict. These ranged from complex social media or loose English verbiage such as “nuked”, “GG”, or “weeeeeee”, to documents with multiple sentiments, to text conveying regret or humor, such as “$TSLA LMAO RIP to those that followed ryan brinkman and bought $GM $F”. A sentiment analysis model like FinSoSent is only aware of the raw text it is presented with, which leaves a gap: it cannot draw on additional information before deciding on a sentiment classification. This becomes clearer when comparing texts from authors with conflicting motivations or perspectives. In investing, an asset increasing in price typically represents a positive sentiment for the investor, whose goal is to maximize profits through the increase in value; the counterexample is a bear investor, who benefits from a decrease in value. When a text is presented to a model, the model takes the information at face value, unaware of the motivating factors of the author, as shown in Table 17.
Table 17. Types of complexities in document sentiment analysis.

| Type | Sample Text | Source Dataset |
|---|---|---|
| Complex social media verbiage | “$INTC $LVS $GM $BX $T $CTL $ABBV Jerome Powell nuked my portfolio today. GG Jerome $F Yeah down we go weeeeeee” | Fin-Lin ID: 2018-09-26T20:47:46Z |
| Multiple sentiments | “Although the technical rating is bad, $NAV does present a nice setup opportunity. <URL>” | Fin-Lin ID: 2018-09-18T06:42:00Z |
| Complex language | “$TSLA Go Tesla! Make the shorts feel the burn - a well the oil companies, the Koch Brothers etc., $GM $F you need to up your game!” | Fin-Lin ID: 2018-07-02T14:29:46Z |
| Processing emotions | “@Tesla stock going up.. I so regret selling my shares at $900 #investing #stocks” | Taborda ID: 879440 |
| Processing emotions | “$TSLA LMAO RIP to those that followed ryan brinkman and bought $GM $F” | Fin-Lin ID: 2018-07-25T18:45:53Z |
| Multiple perspectives | “Remember the turn? Now start preparing for the greatest market crash in history. I say it like I see it. $TSLA $SPY $GILD $ABBV $PFE $TEVA $TDOC $VIX $VXX $UVXY $SVXY $SPX $GOOG $AMZN $FB https://t.co/OhukyvRnIm” | Taborda ID: 472181 |
4. Results

4.1. Base Model Performance
After training the FinSoSent model, we settled on the best-performing model and compared it against alternative sentiment analyzers. This set of analyzers consists of commercial sentiment analyzers, commercial generative AI models, academic sentiment analysis models, and open-source sentiment analyzers. In these experiments, we compared FinSoSent with Amazon-Comprehend [62,63], FinBERT [40], GPT-3.5-Turbo 16K [64], IBM WATSON [65,66], SentiStrength [67], and VADER [43]. The results are shown in Tables 18 and 19. FinSoSent outperforms all models in average accuracy, but not in the weighted F1-score, across the three test datasets. Overall, FinSoSent and VADER each marginally outperform the other, depending on the metric. We also noticed that a generative AI model like GPT 3.5-Turbo ranks third, but is a standard deviation away from VADER and FinSoSent. Overall, there is not a large difference between the scores, and the results show that improvements in this field are mostly marginal due to the complexity of the task.
Table 18. Accuracy of the FinSoSent model compared to a set of commercial, generative AI, academic, and open-source sentiment analysis models. The bolded average value represents the optimal model.

| Model | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| Amazon-Comprehend | 0.408 | 0.727 | 0.446 | 0.527 |
| FinBERT | 0.442 | 0.696 | 0.482 | 0.540 |
| FinSoSent | 0.536 | 0.634 | 0.547 | **0.572** |
| GPT-3.5-Turbo | 0.524 | 0.651 | 0.474 | 0.550 |
| IBM WATSON | 0.464 | 0.634 | 0.515 | 0.538 |
| SentiStrength | 0.418 | 0.581 | 0.495 | 0.498 |
| VADER | 0.479 | 0.537 | 0.664 | 0.560 |
Table 19. Weighted F1-score of the FinSoSent model compared to a set of commercial, generative AI, academic, and open-source sentiment analysis models. The bolded average value represents the optimal model.

| Model | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| Amazon-Comprehend | 0.349 | 0.733 | 0.382 | 0.488 |
| FinBERT | 0.403 | 0.634 | 0.436 | 0.491 |
| FinSoSent | 0.512 | 0.633 | 0.529 | 0.558 |
| GPT-3.5-Turbo | 0.518 | 0.670 | 0.443 | 0.543 |
| IBM WATSON | 0.454 | 0.652 | 0.511 | 0.539 |
| SentiStrength | 0.396 | 0.605 | 0.473 | 0.492 |
| VADER | 0.477 | 0.567 | 0.661 | **0.568** |
4.2. Ensemble Models Performance
Ensembling provided an additional boost in performance through the use of the soft voting and majority voting techniques. Soft voting combines the probabilistic predictions from the FinSoSent, VADER, and IBM Watson models and uses their mean to determine the sentiment. Majority voting, on the other hand, takes the class label predicted by each model and chooses the label on which most models agree; if there is no agreement, neutral is used as the final prediction. Both techniques improved on the predictions of any individual model, although the improvement was modest. The performance of these ensemble models is shown in Tables 20 and 21.
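The two voting rules described above can be sketched in a few lines. This is an illustrative sketch rather than the implementation used in the study: it assumes each component model's output has already been converted to class probabilities in the order (negative, neutral, positive), and the function names and example numbers are hypothetical.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def soft_vote(probabilities):
    """probabilities: one [p_negative, p_neutral, p_positive] list per model.
    Averages the class probabilities and returns the most likely label."""
    n_models = len(probabilities)
    mean = [sum(p[i] for p in probabilities) / n_models for i in range(len(LABELS))]
    return LABELS[mean.index(max(mean))]


def majority_vote(predictions):
    """predictions: one label per model.
    Returns the most common label, or neutral when the top labels are tied."""
    ranked = Counter(predictions).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"  # no agreement between the models
    return ranked[0][0]


# Made-up outputs from three component models for one document.
example_probabilities = [[0.2, 0.3, 0.5], [0.1, 0.6, 0.3], [0.3, 0.2, 0.5]]
print(soft_vote(example_probabilities))
print(majority_vote(["positive", "neutral", "positive"]))
print(majority_vote(["positive", "neutral", "negative"]))
```

Soft voting lets a confident model outweigh two uncertain ones, whereas majority voting treats every model's label equally and needs an explicit tie-breaking rule.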
Table 20. Accuracy of the ensemble models with FinSoSent, VADER, and IBM Watson as the component models. The bolded average value represents the optimal model.

| Model | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| Ensemble-MajorityVoting | 0.478 | 0.708 | 0.582 | **0.589** |
| Ensemble-SoftVoting | 0.481 | 0.689 | 0.568 | 0.579 |
Table 21. Weighted F1-score of the ensemble models with FinSoSent, VADER, and IBM Watson as the component models. The bolded average value represents the optimal model.

| Model | Fin-Lin | Sanders | Taborda | Average |
|---|---|---|---|---|
| Ensemble-MajorityVoting | 0.457 | 0.711 | 0.567 | **0.578** |
| Ensemble-SoftVoting | 0.468 | 0.691 | 0.563 | 0.574 |
4.3. Study Limitations
We note a few limitations of this study. The fine-tuning datasets range in size from 2886 to 10,000 documents, while the testing datasets range from 1300 to 5512 documents. These are relatively small datasets, which could potentially impact the results of this study. In addition, the optimal model was selected through a manual step-by-step approach across more than 860 experiments; other approaches, such as grid search or random search, could also have been used to identify the optimal fine-tuned model.
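The difference between the two alternative search strategies mentioned above can be illustrated with a short sketch. The batch sizes and dataset names below are taken from the tables in this paper; the additional learning rates, the function names, and the scoring function are hypothetical and serve only to make the example runnable.

```python
import itertools
import random

# Batch sizes and dataset names appear in this paper's tables;
# the extra learning rates are hypothetical values added for illustration.
search_space = {
    "learning_rate": [2e-5, 2e-4, 2e-3],
    "batch_size": [32, 64, 96, 128],
    "fine_tuning_dataset": ["FSM", "FSM_ADASYN", "FSM_SMOTE"],
}


def grid_search(space):
    """Enumerate every combination of hyperparameter values."""
    keys = list(space)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(space[k] for k in keys))]


def random_search(space, n_trials, seed=0):
    """Sample n_trials configurations uniformly at random."""
    rng = random.Random(seed)
    return [{k: rng.choice(values) for k, values in space.items()}
            for _ in range(n_trials)]


def select_best(configs, evaluate):
    """Return the configuration with the highest score."""
    return max(configs, key=evaluate)


def placeholder_score(config):
    # Stand-in for "fine-tune with this config and return a validation metric".
    # It is NOT a real result; it only makes the sketch runnable.
    return config["batch_size"] / 128


all_configs = grid_search(search_space)
sampled_configs = random_search(search_space, n_trials=10)
best = select_best(all_configs, placeholder_score)
print(len(all_configs), best["batch_size"])
```

Grid search evaluates every combination, so its cost grows multiplicatively with each added hyperparameter, while random search fixes the budget in advance at the cost of possibly missing the best combination.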
5. Conclusions and Future Work
This work developed a model called FinSoSent by pretraining a BERT-based model and fine-tuning it downstream for sentiment analysis, without preprocessing the text. We believe there is scope for improvement in three areas: using a model with more parameters, changing the scope of the sentiment analysis, and a novel approach to preprocessing the input. First, training a model on BERT-large would increase the number of encoders, bidirectional self-attention heads, and parameters; this may provide a more robust and better understanding of the input documents. Second, we would like to address a problem inherent in document-based sentiment analysis, as discussed by Balaji et al. [57] and Hoang et al. [68], by exploring aspect-based sentiment analysis, which would allow us to address the challenge of multiple sentiments being present in a document. Lastly, we have experimented with a novel preprocessing step that uses generative text models to identify complex verbiage in social media and rewrite it into text that can be embedded using tokens represented within the BERT vocabulary. Figure 4 shows a social media post that was preprocessed with this approach, using GPT 3.5-turbo, into a structured form that is more understandable.
Original text: $FTR Had $775Mil MktCap at close Yest. 10% Has Evaporated In 90 minute today.Been warned again, but Know-it-alls Dream of BS 25% Divys.

Preprocessed text: $FTR had a $775 million market capitalization at the close yesterday. Approximately 10% of it has evaporated in the first 90 minutes today. Despite warnings, some individuals, often considered know-it-alls, continue to dream about a 25% dividend yield, which may be speculative or misleading.

Figure 4. Using GPT 3.5 to preprocess a sample social media post.
Using generative AI text models brings a set of benefits as well as tradeoffs. The benefits include transforming the original text so that its context is clearer and expanding acronyms, including ticker symbols. This would be difficult with traditional NLP preprocessing steps, which usually make the input smaller and, sometimes, less understandable. By enabling this capability, we leverage the benefits of generative AI, but we are also exposed to the weaknesses of these systems. As discussed by Goertzel [69], self-attention has limitations with structured language due to the finite vocabulary and sentence structures, making it a liability in NLP scenarios. In addition, Goertzel discusses the limitations of LLMs for NLP use cases, noting that general models cannot outperform fine-tuned models in specific NLP use cases, although they perform well across many NLP tasks. Finally, Goertzel states that LLMs hallucinate and lack the episodic life history needed for the task of clarifying text from a user of social media.

Clarifying text from a user of social media may entail understanding what the user is talking about, which may span different social media posts or even news events. This benefit can be leveraged to make sentiment easier to predict, by reproducing the original text and providing additional context for acronyms that may not be well represented in BERT’s vocabulary. The challenge of using generative AI LLMs is that they introduce irreproducible results and hallucinations, which may misrepresent acronyms and text, ultimately changing the original sentiment. To combat this, we incorporated a sentence similarity score based on cosine similarity to compare the original text with the preprocessed text and verify that the integrity of the original text is preserved; this is a widely used metric for comparing two texts. The technique is used in search engines to compare a query with the search results. Technically speaking, cosine similarity compares terms between documents; however, it does not capture the semantic similarity of text, as elaborated by Rahutomo et al. [70] and Raju et al. [71]. In Table 22, we present a zero-shot cosine similarity score between the original and the LLM-preprocessed results from GPT 3.5-turbo. We expect to obtain better cosine similarity results by changing the prompt from zero-shot to few-shot prompting in future work.
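A term-based cosine similarity check of this kind can be sketched as follows. The text does not state how the texts were vectorized, so this sketch assumes plain term-count vectors, consistent with the description that cosine similarity compares terms between documents; the tokenization rule and the function names are hypothetical. The example strings are the original and preprocessed texts shown in Figure 4.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words counts; the tokenisation rule is an assumption."""
    return Counter(re.findall(r"[a-z0-9$%]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    a, b = term_vector(text_a), term_vector(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


original = ("$FTR Had $775Mil MktCap at close Yest. 10% Has Evaporated In 90 "
            "minute today.Been warned again, but Know-it-alls Dream of BS 25% Divys.")
preprocessed = ("$FTR had a $775 million market capitalization at the close yesterday. "
                "Approximately 10% of it has evaporated in the first 90 minutes today. "
                "Despite warnings, some individuals, often considered know-it-alls, "
                "continue to dream about a 25% dividend yield, which may be "
                "speculative or misleading.")
score = cosine_similarity(original, preprocessed)
print(round(score, 3))
```

A score close to 1 indicates that the rewritten text reuses most of the original terms, while a low score flags a rewrite that may have drifted from the source; because only shared terms count, paraphrases with different wording are penalized even when the meaning is preserved.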
Table 22. Cosine similarity measure of the fine-tuning datasets.

|  | Fin-Lin | Sanders | Taborda |
|---|---|---|---|
| Cosine Similarity Score Mean | 0.747 | 0.901 | 0.847 |
Author Contributions: Conceptualization, C.M., J.D. and J.K.; Methodology, C.M., J.D. and J.K.; Investigation, J.D.; Data curation, J.D.; Writing - original draft, C.M., J.D. and J.K.; Writing - review & editing, C.M., J.D. and J.K. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Data Availability Statement: The data presented in this study are available on request from the corresponding author.

Conflicts of Interest: The authors declare no conflicts of interest.
Abbreviations

The following abbreviations are used in this manuscript:

| Abbreviation | Definition |
|---|---|
| NLP | natural language processing |
| LLM | large language model |
| BERT | bidirectional encoder representations from transformers |
| FinBERT | BERT model for financial sentiment analysis |
| RNN | recurrent neural network |
| CNN | convolutional neural network |
| LSTM | long short-term memory |
| ELMo | embeddings from language model |
| XLNet | eXtreme multi-label text classification |
| ULMFit | universal language model fine-tuning |
| GPT | generative pretrained transformers |
| FSA | financial sentiment analysis |
| FSM | Fin-SoMe |
| SET5 | SemEval-2017 Task 5 |
| EMH | efficient market hypothesis |
| SSIX | social sentiment indices powered by X-scores |
| TRC2 | Thomson Reuters Text Research Collection |
| NTUSD-Fin | National Taiwan University social media dataset financial |
| EPS | earnings per share |
| P/E | price-to-earning ratio |
| FCF | free cash flow |
| ROE | return on equity |
| RSI | relative strength index |
| MACD | moving average convergence |
| OBV | on-balance volume |
| D/E | debt-to-equity ratio |
| P/B | price-to-book ratio |
| FTS | full-text search |
| VADER | Valence Aware Dictionary and sEntiment Reasoner |
| ADASYN | adaptive synthetic sampling |
| SMOTE | synthetic minority over-sampling technique |
References

1. Financial Terms Dictionary. Investopedia. Available online: https://www.investopedia.com/financial-term-dictionary-4769738 (accessed on 30 November 2021).
2. Fama, E.F. Random Walks in Stock Market Prices. Financ. Anal. J. 1965, 21, 55–59. [CrossRef]
3. Fama, E.F. Efficient Capital Markets: A Review of Theory and Empirical Work. J. Financ. 1970, 25, 383–417. [CrossRef]
4. Twitter, Inc. Available online: https://twitter.com/ (accessed on 30 November 2021).
5. StockTwits, Inc. Available online: https://stocktwits.com/ (accessed on 30 November 2021).
6. Wang, G.; Wang, T.; Wang, B.; Sambasivan, D.; Zhang, Z.; Zheng, H.; Zhao, B.Y. Crowds on Wall Street: Extracting value from collaborative investing platforms. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, Vancouver, BC, Canada, 14–18 March 2015; pp. 17–30.
7. Sohangir, S.; Petty, N.; Wang, D. Financial sentiment lexicon analysis. In Proceedings of the IEEE 12th International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, 12 April 2018; pp. 286–289.
8. Sohangir, S.; Wang, D.; Pomeranets, A.; Khoshgoftaar, T.M. Big data: Deep learning for financial sentiment analysis. J. Big Data 2018, 5, 3. [CrossRef]
9. Zhang, L.; Wang, S.; Liu, B. Deep learning for sentiment analysis: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1253. [CrossRef]
10. Zhao, L.; Li, L.; Zheng, X. A BERT Based Sentiment Analysis and Key Entity Detection Approach for Online Financial Texts. arXiv 2020, arXiv:2001.05326. [CrossRef]
11. Cui, X.; Lam, D.; Verma, A. Embedded Value in Bloomberg News and Social Sentiment Data; Bloomberg, Technical Report. 2016. Available online: https://developer.twitter.com/content/dam/developer-twitter/pdfs-and-files/Bloomberg-Twitter-Data-Research-Report.pdf (accessed on 30 November 2021).
12. Tetlock, T.C. Giving Content to Investor Sentiment: The Role of Media in the Stock Market. J. Financ. 2007, 62, 1139–1168. [CrossRef]
13. Tetlock, P.C.; Saar-Tsechansky, M.; Macskassy, S. More Than Words: Quantifying Language to Measure Firms’ Fundamentals. J. Financ. 2008, 63, 1437–1467. [CrossRef]
14. Delgadillo, J.; Kinyua, J.D.; Mutigwe, C. A BERT-based Model for Financial Social Media Sentiment Analysis. In Proceedings of the International Conference on Applications of Sentiment Analysis (ICASA 2022), Cairo, Egypt, 15–16 December 2022.
15. Zimbra, D.; Abbasi, A.; Zeng, D.; Chen, H. The State-of-the-Art in Twitter Sentiment Analysis: A Review and Benchmark Evaluation. ACM Trans. Manag. Inf. Syst. 2018, 9, 1–29. [CrossRef]
16. Sun, C.; Shrivastava, A.; Singh, S.; Gupta, A. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 843–852.
17. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pretraining of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (Long and Short Papers), pp. 4171–4186.
18. Howard, J.; Ruder, S. Universal Language Model Fine-tuning for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; Volume 1: Long Papers, pp. 328–339.
19. Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; Volume 1 (Long Papers), pp. 2227–2237.
20. Ra
df
o
rd
, A.
;
Wu
, J.
;
C
h
il
d
, R.
;
L
u
a
n
, D.
;
A
m
o
d
e
i, D.
;
Su
t
s
k
e
v
e
r
, I. La
n
g
u
a
g
e
m
o
d
e
l
s
a
r
e
unsup
e
r
vi
s
e
d
mu
ltita
s
k l
e
a
rn
e
rs
.
O
p
enAI
B
l
o
g 2019, 1, 8.
21.
B
e
l
t
a
g
y, I.
;
Lo, K.
;
Co
h
a
n
, A.
S
ci
B
ERT
:
A
Pr
et
r
ai
n
e
d
La
n
g
u
a
g
e
M
o
d
e
l
f
o
r
S
ci
e
n
t
ific T
e
x
t
. I
n
Pr
oc
ee
d
i
n
g
s
o
f
t
h
e
2019 Co
nf
e
r
e
n
c
e
o
n
E
mp
i
r
ical
M
e
t
h
o
d
s
i
n
Nat
ur
al La
n
g
u
a
g
e
P
r
oc
e
ss
i
n
g
a
n
d
t
h
e
9t
h
I
n
t
e
rn
atio
n
al Joi
n
t Co
nf
e
r
e
n
c
e
o
n
Nat
ur
al La
n
g
u
a
g
e
P
r
oc
e
ss
i
n
g
(
E
M
NL
P
-
IJCNL
P
)
, Ho
n
g
Ko
n
g
, C
h
i
n
a, 3–7 Nove
m
be
r
2019
;
pp
. 3615–3620.
22.L
ee
, J.
;
Yoo
n
,
W
.
;
Ki
m
,
S
.
;
Ki
m
, D.
;
Ki
m
,
S
.
;
S
o, C. H.
;
Ka
n
g
, J.
B
io
B
ERT
:
A
pr
e
t
r
ai
n
e
d
bio
m
e
d
ical la
n
g
u
a
g
e
r
e
pr
e
s
e
n
tatio
n
m
o
d
e
l
f
o
r
bio
m
e
d
ical te
x
t
m
i
n
i
n
g
.
B
i
o
in
f
o
r
m
a
tics 2019, 36, 1234–1240.
[
C
r
o
ss
Re
f]
[
P
u
b
M
e
d
]
23. Huang, K.; Altosaar, J.; Ranganath, R. ClinicalBERT: Modeling clinical notes and predicting hospital readmission. arXiv 2019, arXiv:1904.05342.
24. Agaian, S.; Kolm, P. Financial sentiment analysis using machine learning techniques. Int. J. Invest. Manag. Financ. Innov. 2017, 3, 1–9.
25. Man, X.; Luo, T.; Lin, J. Financial Sentiment Analysis (FSA): A Survey. In Proceedings of the IEEE International Conference on Industrial Cyber Physical Systems (ICPS), Taipei, Taiwan, 6–9 May 2019; pp. 617–622.
26. Yang, S.; Rosenfeld, J.; Makutonin, J. Financial aspect-based sentiment analysis using deep representations. arXiv 2018, arXiv:1808.07931. Available online: http://arxiv.org/abs/1808.07931 (accessed on 30 November 2021).
27. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. Available online: http://arxiv.org/abs/1907.11692 (accessed on 30 November 2021).
28. Araci, D. FinBERT: Financial Sentiment Analysis with Pretrained Language Models. arXiv 2019, arXiv:1908.10063. Available online: http://arxiv.org/abs/1908.10063 (accessed on 30 November 2021).
29. Araci, D.T.; Zulkuf Genc, Z. FinBERT: Financial Sentiment Analysis with BERT. Prosus AI Tech Blog. 2020. Available online: https://medium.com/prosus-ai-tech-blog/finbert-financial-sentiment-analysis-with-bert-b277a3607101 (accessed on 1 July 2022).
30. Reuters Corpora (RCV1, RCV2, TRC2). National Institute of Standards and Technology. 2004. Available online: https://trec.nist.gov/data/reuters/reuters.html (accessed on 6 April 2023).
31. Malo, P.; Sinha, A.; Korhonen, P.; Wallenius, J.; Takala, P. Good debt or bad debt: Detecting semantic orientations in economic texts. J. Assoc. Inf. Sci. Technol. 2014, 65, 782–796. [CrossRef]
32. Desola, V.; Hanna, K.; Nonis, P. FinBERT: Pretrained Model on SEC Filings for Financial Natural Language Tasks; Technical Report; University of California: Los Angeles, CA, USA, 2019.
33. Liu, Z.; Huang, D.; Huang, K.; Li, Z.; Zhao, J. FinBERT: A Pretrained Financial Language Representation Model for Financial Text Mining. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), Virtual, 7–15 January 2021; pp. 4513–4519.
34. Common Crawl. Available online: https://commoncrawl.org/ (accessed on 30 November 2021).
35. Financial Web. Available online: https://www.finweb.com/ (accessed on 30 November 2021).
36. Yahoo! Finance. Available online: https://finance.yahoo.com/ (accessed on 30 November 2021).
37. Reddit. Available online: https://www.reddit.com/ (accessed on 30 November 2021).
38. Financial Opinion Mining and Question Answering. 2017. Available online: https://sites.google.com/view/fiqa/ (accessed on 30 November 2021).
39. The First Workshop on Financial Technology and Natural Language Processing (FinNLP) with a Shared Task for Sentence Boundary Detection in PDF Noisy Text in the Financial Domain (FinSBD). [n.d.]. Available online: https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp/ (accessed on 30 November 2021).
40. Yang, Y.; UY, M.C.S.; Huang, A. FinBERT: A Pretrained Language Model for Financial Communications. arXiv 2020, arXiv:2006.08097. Available online: https://arxiv.org/abs/2006.08097 (accessed on 30 November 2021).
41. Huang, A.H.; Zang, A.Y.; Zheng, R. Evidence on the Information Content of Text in Analyst Reports. Account. Rev. 2014, 89, 6, 2151–2180. [CrossRef]
42. Wilksch, M.; Abramova, O. PyFin-sentiment: Towards a machine-learning-based model for deriving sentiment from financial tweets. Int. J. Inf. Manag. Data Insights 2023, 3, 1, 100171. [CrossRef]
43. Hutto, C.; Gilbert, E. VADER: A Parsimonious Rule Based Model for Sentiment Analysis of Social Media Text. In Proceedings of the International AAAI Conference on Web and Social Media, Ann Arbor, MI, USA, 1–4 June 2014; pp. 216–225.
44. Chen, C.-C.; Huang, H.-H.; Chen, H.-H. NTUSD-Fin: A market sentiment dictionary for financial social media data applications. In Proceedings of the 1st Financial Narrative Processing Workshop (FNP 2018), Miyazaki, Japan, 7–12 May 2018; pp. 37–43.
45. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; pp. 5753–5763.
46. Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations. arXiv 2019, arXiv:1909.11942. Available online: http://arxiv.org/abs/1909.11942 (accessed on 30 November 2021).
47. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. arXiv 2019, arXiv:1910.01108. Available online: http://arxiv.org/abs/1910.01108 (accessed on 30 November 2021).
48. Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pretraining for Natural Language Generation, Translation, and Comprehension. arXiv 2019, arXiv:1910.13461. Available online: http://arxiv.org/abs/1910.13461 (accessed on 30 November 2021).
49. Mishev, K.; Gjorgjevikj, A.; Vodenska, I.; Chitkushev, L.T.; Trajanov, D. Evaluation of Sentiment Analysis in Finance: From Lexicons to Transformers. IEEE Access 2020, 8, 131662–131682. [CrossRef]
50. Bartunov, O.; Sigaev, T. Full-Text Search in PostgreSQL Gentle Introduction; Technical Report; Moscow University: Moscow, Russia, 2007.
51. Gaillat, T.; Zarrouk, M.; Freitas, A.; Davis, B. The SSIX Corpora: Three Gold Standard Corpora for Sentiment Analysis in English, Spanish and German Financial Microblogs. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 7–12 May 2018; pp. 2671–2675.
52. Chen, C.-C.; Huang, H.-H.; Chen, H.-H. Issues and Perspectives from 10,000 Annotated Financial Social Media Data. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; pp. 6106–6110.
53. SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News. Available online: https://alt.qcri.org/semeval2017/task5/ (accessed on 30 November 2021).
54. Daudert, T. A Multi-Source Entity-Level Sentiment Corpus for the Financial Domain: The Fin-Lin Corpus. arXiv 2020, arXiv:2003.04073. Available online: http://arxiv.org/abs/2003.04073 (accessed on 30 November 2021).
55. Saif, H.; Fernández, M.; He, Y.; Alani, H. Evaluation datasets for Twitter sentiment analysis: A survey and a new dataset, the STS-Gold. In Proceedings of the 1st International Workshop on Emotion and Sentiment in Social and Expressive Media: Approaches and Perspectives from AI (ESSEM 2013), Turin, Italy, 3 December 2013.
56. Taborda, B.; de Almeida, A.; Dias, J.C.; Batista, F.; Ribeiro, R. Stock Market Tweets Data. IEEE Dataport 2021. [CrossRef]
57. Balaji, P.; Nagaraju, O.; Haritha, D. Levels of Sentiment Analysis and its Challenges: A Literature Review. In Proceedings of the International Conference of Big Data Analytics and Computational Intelligence (ICBDAC), Chirala, India, 23–25 March 2017; pp. 400–403.
58. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [CrossRef]
59. He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalance learning. In Proceedings of the 2008 IEEE International Conference on Neural Networks (IJCNN 2008), Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
60. Li, X.; Wang, X.; Liu, H. Research on fine-tuning strategy of sentiment analysis model based on BERT. In Proceedings of the 2021 IEEE 3rd International Conference on Communications, Information System and Computer Engineering (CISCE), Beijing, China, 14–16 May 2021; pp. 798–802.
61. Popel, M.; Bojar, O. Training Tips for the Transformer Model. arXiv 2018, arXiv:1804.00247. Available online: https://arxiv.org/pdf/1804.00247.pdf (accessed on 25 June 2022).
62. Amazon Web Services. Amazon Comprehend: Features. Available online: https://aws.amazon.com/comprehend/features (accessed on 25 June 2022).
63. Amazon Web Services. Amazon Comprehend Developer Guide. Available online: https://docs.aws.amazon.com/comprehend/latest/dg/comprehend-dg.pdf.how-sentiment (accessed on 25 June 2022).
64. OpenAI, GPT-3.5 Turbo. Available online: https://platform.openai.com/docs/models/gpt-3-5-turbo (accessed on 15 March 2024).
65. IBM Cloud API Docs: Natural Language Understanding. Available online: https://cloud.ibm.com/apidocs/natural-language-understanding?code=python (accessed on 25 June 2022).
66. IBM. Watson Natural Language Understanding: Features. Available online: https://www.ibm.com/cloud/watson-natural-language-understanding/details (accessed on 25 June 2022).
67. SentiStrength. Available online: http://sentistrength.wlv.ac.uk/ (accessed on 25 June 2022).
68. Hoang, M.; Bihorac, O.A.; Rouces, J. Aspect-Based Sentiment Analysis using BERT. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, Turku, Finland, 30 September–2 October 2019; pp. 187–196. Available online: https://aclanthology.org/W19-6120/ (accessed on 30 November 2021).
69. Goertzel, B. Generative AI vs. AGI: The Cognitive Strengths and Weaknesses of Modern LLMs. 2023. Available online: https://arxiv.org/pdf/2309.10371.pdf (accessed on 25 June 2022).
70. Rahutomo, F.; Kitasuka, T.; Aritsugi, M. Semantic Cosine Similarity. In Proceedings of the 7th International Student Conference on Advanced Science and Technology, Seoul, Republic of Korea, 29–30 October 2012. Available online: https://www.researchgate.net/publication/262525676_Semantic_Cosine_Similarity (accessed on 30 November 2021).
71. Nora Raju, T.; Rahana, P.A.; Moncy, R.; Ajay, S.; Nambiar, S.K. Sentence Similarity A State of Art Approaches. In Proceedings of the International Conference on Computing, Communication, Security and Intelligent Systems (IC3SIS), Kochi, India, 23–25 June 2022; pp. 1–6.
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.